Web Table Classification Based on Visual Features
Authors
Abstract
Tables on the web constitute a valuable data source for many applications, such as factual search and knowledge base augmentation. However, because genuine tables containing relational knowledge account for only a small proportion of the tables on the web, reliable table classification is a crucial first step of table extraction. Previous works usually rely on explicit feature construction from the HTML code. In contrast, we propose an approach to web table classification that exploits the full visual appearance of a table and works purely by applying a convolutional neural network to the rendered image of the web table. Since these visual features can be extracted automatically, our approach circumvents the need for explicit feature construction. A new hand-labeled gold standard dataset containing HTML source code and images for 13,112 tables was generated for this task. Transfer learning techniques are applied to the well-known VGG16 and ResNet50 architectures. The evaluation of CNN image classification with a fine-tuned model (F1 93.29%) shows that this approach achieves results comparable to previous solutions using explicitly defined HTML-code-based features. By combining visual and explicit features, an F-measure of 93.70% can be achieved with Random Forest classification, which beats current state-of-the-art methods.
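The abstract reports its results as F1 / F-measure values. As a reminder of what that metric measures, the following is a minimal sketch in Python that computes the F-measure for the "genuine table" class from predicted and true labels. The label names ("genuine", "layout"), the helper names, and the tiny label lists are hypothetical illustrations and are not taken from the paper or its dataset; the abstract also does not say whether the reported scores are per-class or averaged, so this shows the per-class form only.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "genuine"
) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one class.

    The class name "genuine" is an assumed label, not the paper's own encoding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if tp == 0:
        # No correctly predicted positives: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy labels, for illustration only.
y_true = ["genuine", "genuine", "layout", "layout", "genuine", "layout"]
y_pred = ["genuine", "layout", "layout", "genuine", "genuine", "layout"]

tp, fp, fn = confusion_counts(y_true, y_pred)
score = f_measure(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} F1={score:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it stays low unless both are high, which makes it a common choice when the positive class (here, genuine tables) is a small share of the data.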
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-74296-6_15